In contrast to control-theoretic methods, the lack of a stability guarantee remains a significant problem for model-free reinforcement learning (RL). Jointly learning a policy and a Lyapunov function has recently emerged as a promising way to equip the whole system with a stability guarantee. However, the classical Lyapunov constraints introduced in prior work cannot stabilize the system under sampling-based optimization. We therefore propose Adaptive Stability Certification (ASC), which drives the system toward sampling-based stability. Because the ASC condition can guide the search for the optimal policy heuristically, we design the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm on top of it. Our algorithm also avoids the optimization difficulty, common in current approaches, of coupling a variety of constraints into the objective. Evaluated on ten robotic tasks, our method achieves lower accumulated cost and fewer violations of the stability constraint than previous studies.
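The abstract does not spell out the ASC condition, but the core idea of jointly optimizing a policy and a learned Lyapunov function can be illustrated with a minimal PyTorch sketch. Everything below (network shapes, the penalty weight `lam`, the decrease margin `alpha`, and the stand-in cost critic) is a hypothetical illustration, not the authors' ALAC implementation.

```python
# Minimal sketch (not the authors' ALAC): jointly train a policy and a
# candidate Lyapunov function so that L decreases along sampled transitions.
# All shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                       nn.Linear(64, act_dim), nn.Tanh())
# Softplus keeps the Lyapunov candidate non-negative.
lyapunov = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                         nn.Linear(64, 1), nn.Softplus())
# Stand-in cost critic; its own training from observed costs is omitted here.
cost_critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                            nn.Linear(64, 1))

opt = torch.optim.Adam(list(policy.parameters()) + list(lyapunov.parameters()),
                       lr=3e-4)

def actor_loss(s, s_next, lam=1.0, alpha=0.1):
    """Expected cost under the policy plus a penalized decrease condition
    L(s', a') - L(s, a) <= -alpha * L(s, a) on sampled transitions."""
    a, a_next = policy(s), policy(s_next)
    L = lyapunov(torch.cat([s, a], dim=-1))
    L_next = lyapunov(torch.cat([s_next, a_next], dim=-1))
    decrease_violation = torch.relu(L_next - L + alpha * L).mean()
    cost = cost_critic(torch.cat([s, a], dim=-1)).mean()
    return cost + lam * decrease_violation

# One illustrative gradient step on random data standing in for a replay batch.
s, s_next = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
loss = actor_loss(s, s_next)
opt.zero_grad()
loss.backward()
opt.step()
```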
Retrosynthesis is the process of transforming a molecule into potential reactants, thereby identifying a synthetic route. We propose a novel generative framework, $\mathsf{G^2Retro}$, for one-step retrosynthesis prediction. $\mathsf{G^2Retro}$ imitates the reverse logic of synthetic reactions: it first predicts the reaction centers to convert the target molecule into fragments called synthons, and then transforms the synthons into reactants, following previous semi-template-based methods. In predicting reaction centers, $\mathsf{G^2Retro}$ defines a comprehensive set of reaction-center types and enables diversity in the predicted reactions by considering multiple reaction-center candidates. In completing synthons, $\mathsf{G^2Retro}$ deploys a sequence of substructure attachments to transform synthons into reactants, which takes a holistic view of the most recent structure of the synthon being completed, together with the structures of all the involved synthons and the product. Here we demonstrate that $\mathsf{G^2Retro}$ prioritizes the most likely reactants in the benchmark dataset better than state-of-the-art methods, and discovers novel reactions that are not included in the benchmark dataset.
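As a rough illustration of the two-step interface described above (reaction-center prediction followed by synthon completion), the sketch below stubs out both learned models and uses RDKit only for the bond-breaking step; the example molecule, bond index, and function names are hypothetical and not part of $\mathsf{G^2Retro}$.

```python
# Minimal sketch of a two-step retrosynthesis pipeline: predict a reaction
# center, break the product into synthons, then complete the synthons into
# reactants. The learned models are stubbed out for illustration.
from rdkit import Chem

def predict_reaction_center(product_smiles: str) -> int:
    """Stand-in for the learned reaction-center predictor: returns the index
    of the bond to break. Here we simply pick bond 0 for illustration."""
    return 0

def split_into_synthons(product_smiles: str, bond_idx: int) -> list[str]:
    """Break the predicted bond and return the resulting synthon fragments."""
    mol = Chem.MolFromSmiles(product_smiles)
    fragmented = Chem.FragmentOnBonds(mol, [bond_idx], addDummies=True)
    return Chem.MolToSmiles(fragmented).split(".")

def complete_synthons(synthons: list[str]) -> list[str]:
    """Stand-in for the synthon-completion model, which would attach
    substructures to turn each synthon into a full reactant."""
    return synthons  # a real model would edit these fragments

product = "CCOC(=O)c1ccccc1"   # illustrative product SMILES
center = predict_reaction_center(product)
synthons = split_into_synthons(product, center)
reactants = complete_synthons(synthons)
print(synthons, reactants)
```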